Text Chunking based on a Generalization of Winnow

Authors

Tong Zhang

Fred J. Damerau

David Johnson

Abstract

This paper describes a text chunking system based on a generalization of the Winnow algorithm. We propose a general statistical model for text chunking, which we then convert into a classification problem. We argue that the Winnow family of algorithms is particularly suitable for solving classification problems arising from NLP applications, due to their robustness to irrelevant features. However, in theory, Winnow may not converge for linearly non-separable data. To remedy this problem, we employ a generalization of the original Winnow method. An additional advantage of the new algorithm is that it provides reliable confidence estimates for its classification predictions. This property is required in our statistical modeling approach. We show that our system achieves state-of-the-art performance in text chunking at lower computational cost than previous systems.
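For orientation, the following is a minimal sketch of the classical Winnow update (multiplicative promotion and demotion on mistakes over binary features), as commonly attributed to Littlestone. It is not the generalized algorithm the abstract refers to, which is not specified on this page. The function names, the promotion factor `alpha = 2`, and the threshold choice (the number of features) are illustrative assumptions.

```python
from typing import List, Sequence, Tuple


def winnow_predict(weights: Sequence[float], x: Sequence[int], threshold: float) -> int:
    """Return 1 if the summed weight of active features reaches the threshold, else 0."""
    score = sum(w for w, xi in zip(weights, x) if xi)
    return 1 if score >= threshold else 0


def winnow_train(
    examples: Sequence[Tuple[Sequence[int], int]],
    n_features: int,
    alpha: float = 2.0,   # assumed promotion/demotion factor
    epochs: int = 10,     # assumed cap on passes over the data
) -> Tuple[List[float], float]:
    """Train classical Winnow on binary feature vectors with 0/1 labels."""
    weights = [1.0] * n_features      # all weights start at 1
    threshold = float(n_features)     # assumed threshold: number of features
    for _ in range(epochs):
        mistakes = 0
        for x, y in examples:
            if winnow_predict(weights, x, threshold) == y:
                continue              # mistake-driven: no update when correct
            mistakes += 1
            # Promote active features on a missed positive, demote on a false positive.
            factor = alpha if y == 1 else 1.0 / alpha
            for i, xi in enumerate(x):
                if xi:
                    weights[i] *= factor
        if mistakes == 0:
            break                     # a full pass without mistakes
    return weights, threshold
```

Because updates touch only the active features of misclassified examples, weights of irrelevant features stay small relative to relevant ones, which is the robustness property the abstract mentions. On non-separable data this loop simply stops at the epoch cap, which illustrates the convergence concern the abstract raises.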


Similar resources

Text Chunking using Regularized Winnow

Many machine learning methods have recently been applied to natural language processing tasks. Among them, the Winnow algorithm has been argued to be particularly suitable for NLP problems, due to its robustness to irrelevant features. However, in theory, Winnow may not converge for nonseparable data. To remedy this problem, a modification called regularized Winnow has been proposed. In this pap...


A SNoW Based Supertagger with Application to NP Chunking

Supertagging is the tagging process of assigning the correct elementary tree of LTAG, or the correct supertag, to each word of an input sentence. In this paper we propose to use supertags to expose syntactic dependencies which are unavailable with POS tags. We first propose a novel method of applying Sparse Network of Winnow (SNoW) to sequential models. Then we use it to construct a supertagg...


Rapid Development of NLP Modules with Memory-based Learning

The need for software modules performing natural language processing (NLP) tasks is growing. These modules should perform efficiently and accurately, while at the same time rapid development is often mandatory. Recent work has indicated that machine learning techniques in general, and memory-based learning (MBL) in particular, offer the tools to meet both ends. We present examples of modules tr...


Determining the Boundaries and Types of Syntactic Phrases in Persian Texts

Text tokenization is the process of segmenting text into meaningful tokens such as words, phrases, and sentences. Tokenization into syntactic phrases, known as chunking, is an important preprocessing step needed in many applications such as machine translation, information retrieval, and text-to-speech. In this paper chunking of Farsi texts is done using statistical and learning methods and the grammat...


Text Classification in Information Retrieval using Winnow

Text classification in Information Retrieval can be done by using a linear classifier. Linear learning algorithms classify documents by learning a linear separator based on the document features. Littlestone's Winnow is such a linear learning algorithm. I have described three learning algorithms, based on Littlestone's Winnow, which can be applied to perform this task. Modifications of the alg...



Journal:

Journal of Machine Learning Research

Volume 2


Publication date: 2002
